Chapter 4: Collections

Taxonomy

We have already encountered some collections in the previous chapter – arrays, tuples, dicts and sets.

This chapter examines what we can do with collections and how to best use them for maximum effectiveness. A collection may be indexable, associative or neither. This refers primarily to the way we access individual elements within the collection.

  • Indexable: An indexable collection is like a shopping list – the only way to identify individual elements is by pointing out their position. If you want to refer to, say, 1 pint of milk, you refer to it as the fifth entry on my shopping list.
    • Elements of an indexable collection are accessed using the square bracket notation collection[index], e.g. shopping_list[5]. Unusually for most programming languages and in sharp contrast to other languages like Python, indices begin with 1, rather than 0.
    • An indexable collection is also only equal to a collection with the same elements if they are in the same order – thus [1, 3, 5, 7] == [3, 7, 1, 5] yields, as one would expect, false.

In [14]:
shopping_list = ["eggs","flour","jam","syrup","milk"]


Out[14]:
5-element Array{ASCIIString,1}:
 "eggs" 
 "flour"
 "jam"  
 "syrup"
 "milk" 

In [15]:
shopping_list[5]


Out[15]:
"milk"

In [12]:
backup_shopping_list = sort(shopping_list,rev = true)

backup_shopping_list == shopping_list


Out[12]:
false
  • Associative: These collections, on the other hand, resemble a page from a phone book instead (if any of you actually still remember what one of those things is!). You wouldn't say that your phone number is on page 217, left column, fifth from the bottom. Rather, you would have a key (your name), by reference to which someone can find your phone number (the value).
    • Associative arrays do follow the key-value pair form. They are, therefore, not accessed in the collection[index] form but rather in the collection[key] form.
    • An associative collection is fundamentally not indexable, therefore the order of entries does not matter: two associative collections will be equal as long as they contain the same key-value pairs, order regardless.

In [82]:
phonebook = Dict("Mike" => 5551932, "Jane" => 5551933, "John" => 55519324) # Dictionaries use "" exclusively


Out[82]:
Dict{ASCIIString,Int64} with 3 entries:
  "John" => 55519324
  "Mike" => 5551932
  "Jane" => 5551933

In [83]:
phonebook["Mike"] # This wouldn't work with 'Mike'


Out[83]:
5551932

In [84]:
phonebook[1] # Raises error since it's an associative collection; no index


LoadError: KeyError: 1 not found
while loading In[84], in expression starting on line 1

 in getindex at dict.jl:738
  • Neither: Collections that are neither associative nor indexable are a somewhat complex case; sets are the only frequently used collection that is neither associative nor indexable.
    • They are also special because of the uniqueness constraint, that is, a set may contain each value once and only once. A set does not support the commonly used methods of access, but it does support many of the collection manipulation functions, such as push!() and pop!(). The latter, in case you were wondering, returns items in a random order, since the absence of an index means sets are not ordered; you could pop!() any value off a set.
    • Sets are not indexable, consequently two sets that contain the same elements will be considered equal, order regardless.

In addition, collections may be mutable or non-mutable, which references their ability to be changed. Put quite simply, a mutable collection is one where you can change particular values after creation, while in an immutable collection, you cannot do so.

The typical immutable collections are, of course, tuples – once you have assigned a value to a tuple, you cannot change that value (although you can assign some other tuple or indeed an entirely different value to the variable that holds the tuple).

Sets again represent a special case – they are what could be best described as pseudo-immutable – there is no way to access values in a way that could change them, since you normally access an element of a set by its value (which is sufficient in a set thanks to the uniqueness constraint).

Mutable Immutable
Indexable Arrays Tuples
Associative Dicts
Non-indexable and non-associative Sets

Indexable Collections

Accessing

Elements of an indexable collection can be accessed using the square bracket notation, by their ordinal:


In [85]:
prime_array = [2, 3, 5, 7, 11]


Out[85]:
5-element Array{Int64,1}:
  2
  3
  5
  7
 11

In [86]:
prime_array[3]  # Get elements at index 3


Out[86]:
5

In Julia, a range of numbers is written as start:end or start:steps:end. You can use a range to access a range of elements:


In [87]:
prime_array[2:3] # Get elements at index 2 to 3, with the starting index being 1


Out[87]:
2-element Array{Int64,1}:
 3
 5

A range always returns a collection, even if it has the length 1.

This is exemplified by the difference between prime_array[3], the call we made above, and prime_array[3:3], which returns:


In [34]:
prime_array[3:3]


Out[34]:
1-element Array{Int64,1}:
 5

Which is not the same as the single value 5 we returned before from prime_array[3]:


In [35]:
prime_array[3:3] == prime_array[3]


Out[35]:
false

You can access the last element of an indexable collection using [end]:


In [36]:
prime_array[end] # Get me the last value


Out[36]:
11

Incidentally, end behaves like a number – so prime_array[end-1] returns the penultimate element of the collection.


In [37]:
prime_array[end - 2] # Get me the value 2 indexes before the last value


Out[37]:
5

Setting

If the indexable collection you are using is also mutable (e.g. an array), any of these methods will act as a pseudo-variable and will allow you to assign a value to it. Thus prime_array[end] = 12 would change the last element of prime_array to 12. You also aren't restricted to a single element: calling prime_array[2:4] = 0 would result in


In [42]:
prime_array


Out[42]:
5-element Array{Int64,1}:
  2
  3
  5
  7
 11

And, of course, you can use an array or another indexable collection to replace values:


In [43]:
prime_array[2:4] = [13,15,17] # Change values from index 3 to 4 to new primes


Out[43]:
3-element Array{Int64,1}:
 13
 15
 17

In [44]:
prime_array # Show me the new array


Out[44]:
5-element Array{Int64,1}:
  2
 13
 15
 17
 11

Unpacking

Indexable collections can unpack: that is, they can be assigned in a single line to as many distinct variables as they have elements.

This is a very convenient feature, and is much used in functional programming:


In [45]:
actors = ["Ian McKellen", "Martin Freeman", "Elijah Wood"] # Define an array of strings


Out[45]:
3-element Array{ASCIIString,1}:
 "Ian McKellen"  
 "Martin Freeman"
 "Elijah Wood"   

In [48]:
gandalf, bilbo, frodo = actors # Associate those strings to variables


Out[48]:
3-element Array{ASCIIString,1}:
 "Ian McKellen"  
 "Martin Freeman"
 "Elijah Wood"   

In [51]:
gandalf


Out[51]:
"Ian McKellen"

They don't even have to be the same length; we can associate as many variables as there are elements available:


In [59]:
a, b = actors

println(a)
println(b)


Ian McKellen
Martin Freeman

But you cannot have more; the collection does not wrap around:


In [58]:
a, b, c, d = actors

println(a)
println(b)


LoadError: BoundsError: attempt to access 3-element Array{ASCIIString,1}:
 "Ian McKellen"  
 "Martin Freeman"
 "Elijah Wood"   
  at index [4]
while loading In[58], in expression starting on line 1

 in indexed_next at tuple.jl:22

Unpacking can also be used to swap the contents of variables:


In [60]:
firstname = "Irving"
lastname = "Washington"


Out[60]:
"Washington"

In [62]:
firstname, lastname = lastname, firstname # Reassociate variables to other defitions in order


Out[62]:
("Washington","Irving")

In [63]:
lastname # Show the changed variable


Out[63]:
"Irving"

Modifying

push!, pop! and append!

push!, pop! and append! are popular functions used to work with collections.

Note: Functions with an exclamation mark ! indicate that they modify their argument. Consider the following:


In [92]:
arr = [5,4,6]
println("With sort, the old array is unchanged: $(arr), but the new array is  $(sort(arr))") # Create a copy of the array and sort it
println("With sort!, the old array is changed: $(sort!(arr)), and new array is  $(sort!(arr))") # Change the array by sorting it


With sort, the old array is unchanged: [5,4,6], but the new array is  [4,5,6]
With sort!, the old array is changed: [4,5,6], and new array is  [4,5,6]
push!

push! appends the value to the end of the collection.


In [104]:
array = [1,2,3,4]


Out[104]:
4-element Array{Int64,1}:
 1
 2
 3
 4

In [105]:
push!(array, 5) # Push the new value into the end of the collection


Out[105]:
5-element Array{Int64,1}:
 1
 2
 3
 4
 5
pop!

pop! takes the last element of the list, returns it and removes it from the collection.


In [106]:
pop!(array) # Pop the last value off the collection


Out[106]:
5

In [107]:
array # Do we have the original array again?


Out[107]:
4-element Array{Int64,1}:
 1
 2
 3
 4
append!

append!, somewhat unusually, adds the elements of a collection to the end of another collection:


In [108]:
array2 = [5,6,7]


Out[108]:
3-element Array{Int64,1}:
 5
 6
 7

In [109]:
append!(array, array2)


Out[109]:
7-element Array{Int64,1}:
 1
 2
 3
 4
 5
 6
 7

In [110]:
append!(array, 1) # append cannot take single values; use push instead


LoadError: MethodError: `append!` has no method matching append!(::Array{Int64,1}, ::Int64)
Closest candidates are:
  append!{T}(::Array{T,1}, !Matched::AbstractArray{T,1})
while loading In[110], in expression starting on line 1

In [111]:
append!(array, [1]) # append only takes arrays, even of length 1


Out[111]:
8-element Array{Int64,1}:
 1
 2
 3
 4
 5
 6
 7
 1
shift!

shift! and unshift! are the front equivalent of pop! and push!; they add or remove values from the front of a collection.

Similarly to pop!, shift! retrieves the first element of the collection and removes it from the collection (which shifts it):


In [112]:
array


Out[112]:
8-element Array{Int64,1}:
 1
 2
 3
 4
 5
 6
 7
 1

In [113]:
shift!(array) # Shift the first element off the array


Out[113]:
1
unshift!

Similarly to push!, unshift! puts an element to the front of the collection:


In [114]:
unshift!(array, 7) # Unshift the array by adding a new element to it's start


Out[114]:
8-element Array{Int64,1}:
 7
 2
 3
 4
 5
 6
 7
 1

Finding

There's a set of functions starting with find — such as find(), findfirst(), and findnext() — that you can use to get the index or indices of values within an indexable collection that match a specific value, or pass a test. These functions share three properties.

  1. Their result is the index or indices of the value sought or tested for, with the nth element's index being n, not n-1 as you might be used to from other languages (Since we start from 1).

  2. You can use these functions in two principal forms: you can test for a value or you can test for a function, in which case the results will be the values for which the function returns true.

  3. The find functions' way of telling you they haven't found anything is returning zero, since there is no element of index zero.

Let's try to find things within the following Array: primes_and_one = [1,2,3,5,7,11,13,17,19,23]


In [116]:
primes_and_one = [1,2,3,5,7,11,13,17,19,23]


Out[116]:
10-element Array{Int64,1}:
  1
  2
  3
  5
  7
 11
 13
 17
 19
 23
findfirst()

findfirst() finds the first occurrence of a value and returns its index (or zero):


In [118]:
findfirst(primes_and_one, 5) # Where is the first 5 in my array?


Out[118]:
4

As noted above, we can feed the find functions a function as well – it will return values for which the function would return a true value.

We have not really discussed functions, but the general idea should be familiar to you. A function of the form x -> x == 13 results in true of the value of x is 13 and false otherwise.

Let's try to see which prime number is the first to equal 13 (don't expect big surprises):


In [126]:
findfirst(x -> x == 13, primes_and_one) # find the first index that matches the function


Out[126]:
7

In [127]:
primes_and_one[7]


Out[127]:
13

In [128]:
findfirst(x -> x >= 3, primes_and_one) # If more than one value matches a function, returns index of the first match


Out[128]:
3

In [129]:
primes_and_one[3]


Out[129]:
3

You might have noticed that unlike in the case where you were searching for a particular value, where you're searching by a function, the function comes first (findfirst(array, value) vs. findfirst(function, array))

This is a little idiosyncrasy, but has to do with the way Julia determines what to do based on what it is provided with. A lot more of this will be explored in a later Chapter. .

find()

find() returns an array of results, not just the first match.

Thus, let's use the isinteger function to see which of our primes are integers (yet again, the result should not come as a shock):


In [138]:
find(isinteger, primes_and_one) # Show me all the indices of primes that are integers


Out[138]:
10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

In [139]:
find(x -> x >=3, primes_and_one) # Show me all the indices of primes that are greater than 3


Out[139]:
8-element Array{Int64,1}:
  3
  4
  5
  6
  7
  8
  9
 10

In [140]:
primes_and_one[find(x -> x >=3, primes_and_one)] # Wrap it up together and show me the values, not just the indices


Out[140]:
8-element Array{Int64,1}:
  3
  5
  7
 11
 13
 17
 19
 23
findnext()

findnext() returns results from a given index onwards.

Thus, if you want to know the index of the first odd number after index 3 in the list of primes, you would proceed as follows (using the function isodd, which, as you could guess, returns true for odd integers and false otherwise):


In [141]:
findnext(isodd, primes_and_one, 4) # Find me the next odd prime index from the 4th index onwards


Out[141]:
4

Wait, why 4? As you might remember, Julia is 1-indexed, not 0-indexed, and inclusive. Therefore, an index 'begins before' the number.

The number after the first index is the first number in the sequence and so on. As such, the number after the third item in a collection is the item next to (= following) the index 4, not 3.

You might also have noticed that when you use a function as an argument, you do not use the parentheses you would normally use to call a function. That is because function() means call this function while function is merely a reference to the function object itself.

Retrieving

So far, we've only been getting indices. How do we retrieve the actual elements? The answer is, of course, by using our magical [](square brackets) syntax. We'll also use this as a good opportunity to introduce a very useful function, isprime(), which returns true for primes and false otherwise:


In [285]:
find(isprime, primes_and_one) # What indices of this array are prime?


Out[285]:
9-element Array{Int64,1}:
  2
  3
  4
  5
  6
  7
  8
  9
 10

In [286]:
primes_and_one[find(isprime,primes_and_one)] # What are the values at the indices of this array that are prime?


Out[286]:
9-element Array{Int64,1}:
  2
  3
  5
  7
 11
 13
 17
 19
 23

Filtering

The filter() function works quite similar to find, except in this case returns only the elements that satisfy the condition (it is, effectively, a shorthand for the previous listing):


In [287]:
filter(isodd, primes_and_one) # What are the values of this array that are odd?


Out[287]:
9-element Array{Int64,1}:
  1
  3
  5
  7
 11
 13
 17
 19
 23

filter() can be used in-place, by using the ! after the name of the function (thereby changing the original variable).

Using filter!(), altering the actual array rather than returning a filtered copy. Note, however, that functions ending with ! modify the object, so, obviously immutable type they act on must be mutable – you would, therefore, not be able to filter!() a tuple, even though you would be able to filter() it.

Sorting

The sort() function sorts an array lexicographically, generally in an ascending order:


In [288]:
sort([-3, 2, 1, 7])


Out[288]:
4-element Array{Int64,1}:
 -3
  1
  2
  7

You can specify the sort criterion with a function using by – in this case, we will be using the absolute value function abs (remember not to use the parentheses symbolising function call, just the name of the function):


In [289]:
sort([-3,2,1,7], by=abs) # Sort this aray by absolute value


Out[289]:
4-element Array{Int64,1}:
  1
  2
 -3
  7

You can change the order of sorting using rev:


In [149]:
sort([-3,2,1,7], by=abs, rev=true) # Sort this array by absolute value and reverse it.


Out[149]:
4-element Array{Int64,1}:
  7
 -3
  2
  1

And, for the great joy of algorithm nerds, you can choose the sort algorithm using alg.

Here's a fun review of some sorting algorithms:

Julia currently supports three sorting algorithms (InsertionSort, QuickSort and MergeSort).


In [150]:
sort([-3,2,1,7], by=abs, rev=true, alg=MergeSort) # Sort the array by absolute value using MergeSort, and reverse it


Out[150]:
4-element Array{Int64,1}:
  7
 -3
  2
  1

For mutable indexable collections, such as arrays, you can use sort!(), which sorts 'in place'.

You can also sort non-numeric elements, or indeed anything for which the isless() function is defined, which sorting uses internally:


In [152]:
sort!(["Bayes", "Laplace", "Poisson", "Gauss"]) # Sort this array in place


Out[152]:
4-element Array{ASCIIString,1}:
 "Bayes"  
 "Gauss"  
 "Laplace"
 "Poisson"

Counting

count() tells you the number of instances in the collection that satisfy the criterion:


In [156]:
count(isodd, primes_and_one) # How many values in this array match the isodd function?


Out[156]:
9

Testing

all() and any() implement two of the mathematical concepts known as quantifiers, with all() representing the universal quantifier "for all" (∀), while any() implements the existential quantifier "for any" (∃x).

These functions test whether all or any, respectively, of a collection satisfies a certain criterion, and return a single truth value.


In [159]:
all(x -> x >= 3, [1:10]) # Does f(x) apply for all values in the defined range (x >= 3 ∀ {1:10})?


Out[159]:
false

In [189]:
any(x -> x >= 3, [1:10]) # Does f(x) apply for any values in the defined range (x >= 3 ∃x {1:10})?


Out[189]:
true

Checking

To find out whether an array has a particular value among its elements, you can use in():


In [168]:
in(3, primes_and_one) # Is there a 2 in this collection?


Out[168]:
true

Somewhat strangely, in the in() syntax, the needle comes before the haystack, i.e. in(value, array), where value denotes the value you are looking for, just like other function references.

Particulars

Arrays

Arrays (the ones we used in our examples so far in this section) are mutable indexable collections. The type Array{T,N} indicates an N-dimensional array which elements' types are subtypes of T.

For instance, Array{Number, 2} is an 2-dimensional array. Its elements' types descend from Number (e.g.Int, Float64).

Access in multidimensional arrays

How do we access elements in a multidimensional array, a special form of indexable collection? Simple – in a multidimensional array, indexes go down each row, then from left to right. Therefore, this array


In [170]:
md_array = ["A" "B"; "C" "D"]


Out[170]:
2x2 Array{ASCIIString,2}:
 "A"  "B"
 "C"  "D"

would be indexed as follows:


In [171]:
println(md_array[1])
println(md_array[2])
println(md_array[3])
println(md_array[4])


A
C
B
D

This is a little counterintuitive and different from the usual row/column notation, where you would use array[row][column].

In Julia, to retrieve a cell by row and column, use array[row, column]:


In [172]:
md_array[1,2]


Out[172]:
"B"

This generalizes for higher dimension arrays.

Tuples

A tuple is an ordered sequence of elements, like an array. A tuple is represented by parentheses and commas, rather than the square brackets used by arrays. The important difference between arrays and tuples is that tuples are immutable: you can't change the elements of a tuple, or the tuple itself, after creating it.

Tuples are generally used for small fixed-length collections — they're ubiquitous across Julia, for example as argument lists. Where a function returns multiple values, which, as we see, is pretty often the case, the result is a tuple.

A corollary of immutability is that none of the ! functions work on tuples - but at the same time, using functions such as push() is perfectly acceptable, since it returns a copy of the tuple with the added element, which does not alter the original tuple.

Associative Collections

An associative collection (typically Dictionaries) is a kind of non-indexed collection that stores (usually) pairs of values.

The indexable collections you have encountered correspond to real-life examples such as a shopping list or a sequential list of train stations. Associative collections, on the other hand, have a key and a value (for this reason, they are sometimes referred to as key-value pairs).

Julia, like many other programming languages, implements associative collections in an object called a dict (short for dictionary), which corresponds to 'maps', 'hash tables' or 'dictionaries' in other programming languages.

A dict, as we have seen, is usually created using the dict literal:


In [200]:
dict = Dict("a" => 1, "b" => 2, "c" => 3)


Out[200]:
Dict{ASCIIString,Int64} with 3 entries:
  "c" => 3
  "b" => 2
  "a" => 1

The key of a key-value pair is unique, meaning that while several keys might point at the same value (and a key might point at a collection as a value), you cannot have duplicate keys (in database terminology, you might have heard this referred to as a one-to-many relationship).

Creating

Other than the Dict() literal, there are three more ways to create a dict.

First, you can create a dict using the comprehension syntax for dicts. An example is


In [194]:
[i => sqrt(i) for i = 1:2:15] # The => defines the key -value relationship, and returns the result as a dict


Out[194]:
Dict{Int64,Float64} with 8 entries:
  7  => 2.6457513110645907
  9  => 3.0
  13 => 3.605551275463989
  3  => 1.7320508075688772
  11 => 3.3166247903554
  5  => 2.23606797749979
  15 => 3.872983346207417
  1  => 1.0

This creates a dict with the square root of every odd number from 1 to 15. In this case, i can be any iterable – while ranges are the most frequently used, there is no reason why.

The following would not be equally valid:


In [198]:
Dict(i => sqrt(i) for i = [2, 5, 6, 8, 12, 64]) # Dict only takes key pairs,


LoadError: syntax: missing comma or ) in argument list
while loading In[198], in expression starting on line 1

Note: Comprehension syntax is generally applicable to most collections:


In [197]:
[sqrt(i) for i = 1:2:15] # This is just list comprehesnion; the key-pair can be dropped to return an array


Out[197]:
8-element Array{Float64,1}:
 1.0    
 1.73205
 2.23607
 2.64575
 3.0    
 3.31662
 3.60555
 3.87298

Secondly, you can also create an empty dictionary. Dict() will construct an empty dictionary that permits any elements, while Dict{type1, type2}() will create an empty dictionary that permits any elements with keys of type1 and values of type2.

Finally, earlier versions of Julia used to support what is sometimes referred to as zip creation of a dict, namely entering two equal-length tuples, one for keys and one for values. This is now regarded as deprecated – it still works, but you should not use it. Instead, the correct syntax is Dict(zip(ks, vs)):


In [201]:
ks = ("a", "b", "c")
vs = ("1", "2", "3")


Out[201]:
("1","2","3")

In [202]:
Dict(zip(ks,vs))


Out[202]:
Dict{ASCIIString,ASCIIString} with 3 entries:
  "c" => "3"
  "b" => "2"
  "a" => "1"

Accessing

Just like items in an indexable arrays are keyed by their index, items in a dict are identified by their key and retrieved using the square bracket syntax with the relevant key:


In [205]:
statisticians = Dict("Gosset" => "1876-1937", "Pearson" => "1857-1936", "Galton" => "1822-1911")


Out[205]:
Dict{ASCIIString,ASCIIString} with 3 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Gosset"  => "1876-1937"

In [206]:
statisticians["Gosset"]


Out[206]:
"1876-1937"

One drawback of the bracket syntax is that if there is no entry for the key provided, Julia will raise an error:


In [207]:
statisticians["Kendall"]


LoadError: KeyError: Kendall not found
while loading In[207], in expression starting on line 1

 in getindex at dict.jl:738

An alternative form of accessing a dictionary is using the get() function, which accepts a default value:

get()


In [214]:
get(statisticians, "Pearson", "I'm sorry, I don't know when this person lived.")


Out[214]:
"1857-1936"

In [215]:
get(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")


Out[215]:
"I'm sorry, I don't know when this person lived."

This is particularly helpful for key-error handling; any not found entries all inherit this default response.

Unlike in some other programming languages, a default is not optional for Julia:


In [216]:
get(statisticians, "Kendall")


LoadError: MethodError: `get` has no method matching get(::Dict{ASCIIString,ASCIIString}, ::ASCIIString)
Closest candidates are:
  get{K,V}(::Dict{K,V}, ::Any, !Matched::Any)
  get(!Matched::ObjectIdDict, ::ANY, !Matched::ANY)
  get{K}(!Matched::WeakKeyDict{K,V}, ::Any, !Matched::Any)
  ...
while loading In[216], in expression starting on line 1

Modifying

get!()

Because dicts are mutable, get!() can create an access a value by its key and return this new value.

Its syntax is identical to the get() idea of a default value:


In [211]:
get!(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")


Out[211]:
"I'm sorry, I don't know when this person lived."

In [212]:
statisticians


Out[212]:
Dict{ASCIIString,ASCIIString} with 4 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Kendall" => "I'm sorry, I don't know when this person lived."
  "Gosset"  => "1876-1937"

pop!()

pop!() drops the key-value array matching the key.

If the key does not exist, it returns an optional default value or throws an error:


In [220]:
pop!(statisticians, "Kendall")


Out[220]:
"I'm sorry, I don't know when this person lived."

In [221]:
statisticians


Out[221]:
Dict{ASCIIString,ASCIIString} with 3 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Gosset"  => "1876-1937"

Changing

To change a value, access it via the bracket syntax, then assign it the new value:


In [222]:
statisticians["Kendall"] = "1907-1983"


Out[222]:
"1907-1983"

In [231]:
statisticians


Out[231]:
Dict{ASCIIString,ASCIIString} with 4 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

Checking

To check for the existence of a key without retrieving it, you can use haskey():


In [224]:
haskey(statisticians, "Galton") # Does this Dict have this key?


Out[224]:
true

In [230]:
haskey(statisticians, "Bayes")


Out[230]:
false

You can also check for the existence of a key-value pair in a dict using the in() function you might be familiar with from arrays using the associative array symbol =>:


In [242]:
statisticians


Out[242]:
Dict{ASCIIString,ASCIIString} with 4 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

In [243]:
in(("Galton" => "1822-1911"), statisticians) # Is this key-pair in this Dict?


Out[243]:
true

In [244]:
in(("Bayes" => "1702-1761"), statisticians)


Out[244]:
false

Retrieving

To retrieve all keys of a dict, use keys().

This will retrieve an object of type KeyIterator, which does just what the name suggests - it iterates through the keys of an array. This will be useful later on when we want to iterate through the dictionary by keys:


In [245]:
keys(statisticians)


Out[245]:
Base.KeyIterator for a Dict{ASCIIString,ASCIIString} with 4 entries. Keys:
  "Galton"
  "Pearson"
  "Kendall"
  "Gosset"

You can retrieve values, predictably, by using the values() function:


In [246]:
values(statisticians)


Out[246]:
Base.ValueIterator for a Dict{ASCIIString,ASCIIString} with 4 entries. Values:
  "1822-1911"
  "1857-1936"
  "1907-1983"
  "1876-1937"

Sorting

You may have noticed that dicts are unordered. Even a dict generated by reference to a range, such as the one seen above, will not be in any particular order:


In [249]:
[i => sqrt(i) for i = 1:1:15]


Out[249]:
Dict{Int64,Float64} with 15 entries:
  2  => 1.4142135623730951
  11 => 3.3166247903554
  7  => 2.6457513110645907
  9  => 3.0
  10 => 3.1622776601683795
  8  => 2.8284271247461903
  6  => 2.449489742783178
  4  => 2.0
  3  => 1.7320508075688772
  5  => 2.23606797749979
  13 => 3.605551275463989
  14 => 3.7416573867739413
  15 => 3.872983346207417
  12 => 3.4641016151377544
  1  => 1.0

This is because dicts are not indexable, therefore there is no ordering that would make inherent sense.

However, sometimes, we like dictionaries sorted. Disappointingly, sorting dicts is not as easy as sorting arrays: sort(statisticians) tells us that 'sort' has no method matching sort(::Dict{ASCIIString,ASCIIString}).

Therefore, you have to write your own sort function that first converts statisticians from a dict into an array of 2-element tuples. This is because the sort() function has no defined methods for dicts, but it can sort arrays, including tuples, where it sorts by the first element in the tuple.

We can emulate this functionality with sort(collect(statisticians)) which converts this Dict into another (typically an Array:


In [261]:
sort(collect(statisticians))


Out[261]:
4-element Array{Pair{ASCIIString,ASCIIString},1}:
 "Galton"=>"1822-1911" 
 "Gosset"=>"1876-1937" 
 "Kendall"=>"1907-1983"
 "Pearson"=>"1857-1936"

We then wrap this back up in a Dict:


In [262]:
sort(collect(statisticians))


Out[262]:
4-element Array{Pair{ASCIIString,ASCIIString},1}:
 "Galton"=>"1822-1911" 
 "Gosset"=>"1876-1937" 
 "Kendall"=>"1907-1983"
 "Pearson"=>"1857-1936"

Merging

The function merge() merges two or more dicts.


In [263]:
mathematicians = Dict("Gauss" => "1777-1855", "Leibniz" => "1646-1716", "Abel" => "1802-1829")


Out[263]:
Dict{ASCIIString,ASCIIString} with 3 entries:
  "Abel"    => "1802-1829"
  "Leibniz" => "1646-1716"
  "Gauss"   => "1777-1855"

In [264]:
merge(mathematicians, statisticians)


Out[264]:
Dict{ASCIIString,ASCIIString} with 7 entries:
  "Abel"    => "1802-1829"
  "Galton"  => "1822-1911"
  "Leibniz" => "1646-1716"
  "Gauss"   => "1777-1855"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

Its bang counterpart, merge!(), merges them in place, overwriting the first dict mentioned while leaving the second intact.


In [265]:
merge!(mathematicians, statisticians)


Out[265]:
Dict{ASCIIString,ASCIIString} with 7 entries:
  "Abel"    => "1802-1829"
  "Galton"  => "1822-1911"
  "Leibniz" => "1646-1716"
  "Gauss"   => "1777-1855"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

In [266]:
mathematicians


Out[266]:
Dict{ASCIIString,ASCIIString} with 7 entries:
  "Abel"    => "1802-1829"
  "Galton"  => "1822-1911"
  "Leibniz" => "1646-1716"
  "Gauss"   => "1777-1855"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

In [267]:
statisticians


Out[267]:
Dict{ASCIIString,ASCIIString} with 4 entries:
  "Galton"  => "1822-1911"
  "Pearson" => "1857-1936"
  "Kendall" => "1907-1983"
  "Gosset"  => "1876-1937"

Non-Indexable & Non-Associative Collections

Sets are the primary collection of this type.

You might be familiar with the idea of sets from maths/set theory. A set is a non-indexable, non-associative and non-mutable collection that also has unique elements. No element may occur twice, so an element's value identifies it conclusively.

Creating

To create a set, use the Set() constructor function. You can create a set that accepts any data type


In [269]:
primes_set = Set()


Out[269]:
Set{Any}()

Or you can specify what sort of data types it would accept:


In [273]:
primes_set = Set{Int64}()


Out[273]:
Set{Int64}()

You can create and fill sets in one go by listing elements surrounded by curly braces {}, and if you surround the elements with square brackets [] instead of curly braces {} Julia will guess the type:

mersenne_primes_set = Set([3, 7, 31, 127])

Operating

Sets have some unique functions that accommodate certain problems well-known from set theory: the functions union(), intersect() and setdiff() each, respectively, implement the union, intersection and difference of sets. They are also immutable, so they cannot be changed; they can only generate new sets.

Let's see how we can use this to find some similarities between the cast of two blockbusters, The Lord of the Rings and The Matrix.

First, let's create two sets with some actors from each movie:


In [290]:
lotr_actors = Set(["Elijah Wood", "Ian McKellen", "Viggo Mortensen", "Hugo Weaving"])
matrix_actors = Set(["Keanu Reeves", "Lawrence Fishburne", "Hugo Weaving"])


Out[290]:
Set(ASCIIString["Keanu Reeves","Hugo Weaving","Lawrence Fishburne"])

To find shared actors, we can use intersect():


In [291]:
intersect(lotr_actors, matrix_actors)


Out[291]:
Set(ASCIIString["Hugo Weaving"])

To find actors who only starred in Lord of the Rings but not in The Matrix, we can use setdiff(), which shows all elements that are in the first Set but not the second:


In [280]:
setdiff(lotr_actors, matrix_actors)


Out[280]:
Set(ASCIIString["Elijah Wood","Ian McKellen","Viggo Mortensen"])

Finally, we can see actors who played in either of the movies, by using union():


In [281]:
union(lotr_actors, matrix_actors)


Out[281]:
Set(ASCIIString["Elijah Wood","Ian McKellen","Hugo Weaving","Keanu Reeves","Lawrence Fishburne","Viggo Mortensen"])

Collections & Types

Until now, we have generally created collections using literals, and with precious little regard to the types of information that go in them. While types will be discussed in quite a bit of detail later on, what we do know about types is that they are individual categories of data.

Julia operates what is called type inference: unless you tell it explicitly what type something is, it tries to figure it out best as it can. We see this in operation when we create a new collection. When a collection is created and Julia is not told that this is going to be a collection containing elements of only a particular kind or particular kinds of values, it makes an educated guess. The REPL tells us this much:


In [292]:
mersenne_primes = [3, 7, 31, 127, 8191, 131071]


Out[292]:
6-element Array{Int64,1}:
      3
      7
     31
    127
   8191
 131071

Upon creating the array, the REPL reports to us that it's an array consisting of six elements, all of type Int64 – a type of signed 64-bit integer (don't worry if that means quite little to you just yet, we will be discussing various integer types in Chapter [X]). It also, helpfully, reports to us that we've got a 1-dimensional array.

Type inference and dissimilar types

What, however, if I want to play it a little wild and mix it up? Consider the following array:


In [293]:
not_really_mersenne_primes = [3, 7, "potato", 127, π, "wallaby"]


Out[293]:
6-element Array{Any,1}:
   3                    
   7                    
    "potato"            
 127                    
  π = 3.1415926535897...
    "wallaby"           

As you have guessed, Julia is at a loss as to what to do with this, since we've got a mix of integers, strings and a constant thrown in for good measure. Therefore, it tells us that it has inferred the type of the collection to be Any – a type that applies to all objects.

Type inference and empty collections

The other marginal case is that of the empty set. Julia has a dedicated type, None – a subtype of Any – that applies to the empty set:


In [284]:
empty_set = []


Out[284]:
0-element Array{Any,1}

Conclusion

In this chapter, we learned about the way Julia's handles collections, whether they are Indexable, Associative, or neither. An indexable array is referenced by its index value (starting from 1 in Julia), and provides a bunch of operations for changing, setting, modifying, finding, and even testing these collections. Associative arrays are based on the notion of key-value pairs, and are almost always dictionaries. They do not have an order, and rely on the notion of retrieving a value for a given key, not indices. Sets are neither indexable nor associative; they are a collection that deals exclusvily with unique values. In any case, they all share one important property – to act as 'envelopes' for multiple elements, each with their distinct type.